Linear Algebra
Theorem. Every finite dimensional Hilbert space has scalar field either \R or \C.
Question. Is this true? What about \R[X] or quaternions?
Definition. Given a symmetric positive semi-definite matrix K ∈ \R^{n × n}, the Mahalanobis norm \norm{\dummyarg}_K is
\norm{\vec x}_K ≜ \sqrt{\vec x^T ⋅ K ⋅ \vec x}
Definition. Given a symmetric positive semi-definite matrix K ∈ \R^{n × n}, a whitening transform is a matrix W ∈ \R^{n × n} such that W ⋅ K ⋅ W^T = I.
Note. There are many solutions W that satisfy the above. In particular given any whitening matrix W and unitary matrix U ∈ \R^{n × n} the matrix U ⋅ W is also a whitening matrix.
Definition. The Mahalanobis whitening matrix is W = K^{- \frac 12}.
Definition. The Cholesky transform is W = L^T where L^T ⋅ L = K^{-1} is the Cholesky decomposition.
Definition. The ZCA-Cor transform is W = P^{-\frac 12} ⋅ V^{- \frac 12} where V is the diagonal of K and P = V^{-\frac 12} ⋅ K ⋅ V^{-\frac 12}.
Theorem. ZCA-Cor maximizes the correlation between the original an whitened values. See KLS16.
To do. Proof that this actually is a norm. Explain https://en.wikipedia.org/wiki/Whitening_transformation and relationship to Euclidean norm.
To do. The whitening matrix is not unqiue see here for two solutions.
Note. When Ω = I the Mahalanobis norm equals the Euclidean norm.
To do. Summarize SVD and how it relates to almost everything else.
To do. Discuss the Generalized SVD GSVD and its (potentia) application here.
To do. Discuss Moore-Penrose pseudoinverse.
https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant
https://en.wikipedia.org/wiki/Hessian_matrix
Norms and means
Definition. Given p ∈ \R ∪ \set{-∞, +∞} The p-norm is
\norm{\vec x}_p ≜ \p{\sum_i \abs{x_i} ^p}^{\frac{1}{p}}
Note. The limiting cases \set{-∞, 0, +∞} are not real norms.
Note. The special case p=1 sL1-norm is also known as the *Taxicab norm, Manhattan norm, or in the case of one dimension, the Absolute-value norm.
Note. The limiting case of a 0-norm is the Hamming weight. The corresponding distance is the Hamming distance.
Note. The 2-norm is also known as the Euclidean norm.
Note. The +∞-norm is also known as the ∞-norm, the infinity norm or the maximum norm.
To do. What happens with the negative norms?
Definition. Each norm has an associated distance.
d(\vec a, \vec b) = \norm{a - b}
Note. Each distance has an associated nearest scalar value in the following sence:
s = \arg \min_s d\p{\vec x, s ⋅ \vec 1}
Note. The summary statistic from the 0-norm is called the mode and is not unique.
Note. The summary statistic from the 1-norm is called the median.
Note. The summary statistic from the 2-norm is called the mean.
https://www.johnmyleswhite.com/notebook/2013/03/22/modes-medians-and-means-an-unifying-perspective/
https://www.johnmyleswhite.com/notebook/2013/03/22/using-norms-to-understand-linear-regression/
Definition. For values \vec x ∈ \R^n the generalized mean is
\p{\frac{1}{n} \sum_i x_i^p}^{\frac{1}{p}}
https://en.wikipedia.org/wiki/Lehmer_mean
https://en.wikipedia.org/wiki/Quasi-arithmetic_mean
Note. The -∞-norm is also known as the minimum norm?
https://www.johnmyleswhite.com/notebook/2014/03/24/a-note-on-the-johnson-lindenstrauss-lemma/
Matrix Calculus
\pder{\vec x} \vec y
References
- 
KLS16 Agnan Kessy, Alex Lewin, Korbinian Strimmer (2016) “Optimal whitening and decorrelation” https://arxiv.org/abs/1512.00809 
- 
Terrence Parr, Jeremy Howards (2018). “The Matrix Calculus You Need For Deep Learning” https://arxiv.org/abs/1802.01528 https://explained.ai/matrix-calculus/ 
